Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

port udivmoddi4 and __aeabi_uldivmod #25

Merged
merged 11 commits into from
Aug 11, 2016
Merged

port udivmoddi4 and __aeabi_uldivmod #25

merged 11 commits into from
Aug 11, 2016

Conversation

japaric
Copy link
Member

@japaric japaric commented Aug 11, 2016

cc @thejpster

The udivmoddi4 intrinsic is tested using quickcheck.

I'm concerned about the __aeabi_uldivmod intrinsic. First, I get infinite recursion but that probably can be fixed with the hacks @Amanieu suggested in #16. The other issue is that the assembly of the __aeabi* intrinsic doesn't look like compiler-rt's assembly implementation.

This is the assembly generated by this crate (for the arm-unknown-linux-gnueabi target):

00000000 <__aeabi_uldivmod>:
   0:   e92d4010        push    {r4, lr}
   4:   e24dd010        sub     sp, sp, #16
   8:   e59dc018        ldr     ip, [sp, #24]
   c:   e59de01c        ldr     lr, [sp, #28]
  10:   e1a04000        mov     r4, r0
  14:   e28d0008        add     r0, sp, #8
  18:   e1a01003        mov     r1, r3
  1c:   e58d0000        str     r0, [sp]
  20:   e1a00002        mov     r0, r2
  24:   e1a0200c        mov     r2, ip
  28:   e1a0300e        mov     r3, lr
  2c:   ebfffffe        bl      0 <__aeabi_uldivmod>
  30:   e59d2008        ldr     r2, [sp, #8]
  34:   e59d300c        ldr     r3, [sp, #12]
  38:   e884000f        stm     r4, {r0, r1, r2, r3}
  3c:   e28dd010        add     sp, sp, #16
  40:   e8bd8010        pop     {r4, pc}

And this is compiler-rt's implementation:

        push    {r11, lr}
        sub     sp, sp, #16
        add     r12, sp, #8
        str     r12, [sp]
        bl      SYMBOL_NAME(__divmoddi4)
        ldr     r2, [sp, #8]
        ldr     r3, [sp, #12]
        add     sp, sp, #16
        pop     {r11, pc}

Even though I implemented the intrinsic in Rust following the compiler-rt's "C implementation", which is written in the comments of the source file:

struct { int64_t quot, int64_t rem}
       __aeabi_ldivmod(int64_t numerator, int64_t denominator) {
  int64_t rem, quot;
  quot = __divmoddi4(numerator, denominator, &rem);
  return {quot, rem};
}

But I used u64 because i64 doesn't really make sense (the __divmoddi4 takes unsigned integers as arguments).

@Amanieu Any idea of what's wrong with my __aeabi_uldivmod implementation? Should I implement it using assembly + naked functions instead?

@Amanieu
Copy link
Member

Amanieu commented Aug 11, 2016

It seems that __aeabi_uldivmod uses a custom calling convention, so you'll have to implement it as a naked function. Also note that your 64-bit division function makes use of 32-bit division, so you'll have to implement that as well.

@Amanieu
Copy link
Member

Amanieu commented Aug 11, 2016

You might want to rebase your implementation on top of #26 and make use of the traits.

}

#[no_mangle]
pub unsafe extern "aapcs" fn __aeabi_uldivmod(num: u64, den: u64) -> u64x2 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think unsafe is necessary here. We only needed it for the mem* builtins.

@japaric
Copy link
Member Author

japaric commented Aug 11, 2016

It seems that __aeabi_uldivmod uses a custom calling convention, so you'll have to implement it as a naked function.

Alright, thanks for the input and done.

Also note that your 64-bit division function makes use of 32-bit division, so you'll have to implement that as well.

Just ported that.

@japaric
Copy link
Member Author

japaric commented Aug 11, 2016

You might want to rebase your implementation on top of #26 and make use of the traits.

Nice. I'll get to it tomorrow.

@@ -295,3 +295,64 @@ pub extern "C" fn __udivmoddi4(a: u64, b: u64, rem: *mut u64) -> u64 {
}
q.u64()
}

#[no_mangle]
pub extern "C" fn __udivmodsi4(a: u32, b: u32, rem: *mut u32) -> u32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Option<&mut u32>

@japaric
Copy link
Member Author

japaric commented Aug 11, 2016

First, I get infinite recursion but that probably can be fixed with the hacks

So, even though objdumping this crate shows infinite recursion, when I use this crate in other project to produce executables the infinite recursion disappears e.g.:

0800013c <__aeabi_uidivmod>:
 800013c:       b500            push    {lr}
 800013e:       b081            sub     sp, #4
 8000140:       466a            mov     r2, sp
 8000142:       f000 f92b       bl      800039c <__udivmodsi4>
 8000146:       9900            ldr     r1, [sp, #0]
 8000148:       b001            add     sp, #4
 800014a:       bd00            pop     {pc}

0800014c <__aeabi_uldivmod>:
 800014c:       e92d 4800       stmdb   sp!, {fp, lr}
 8000150:       b084            sub     sp, #16
 8000152:       f10d 0c08       add.w   ip, sp, #8
 8000156:       f8cd c000       str.w   ip, [sp]
 800015a:       f000 f805       bl      8000168 <__udivmoddi4>
 800015e:       9a02            ldr     r2, [sp, #8]
 8000160:       9b03            ldr     r3, [sp, #12]
 8000162:       b004            add     sp, #16
 8000164:       e8bd 8800       ldmia.w sp!, {fp, pc}

08000168 <__udivmoddi4>:
 8000168:       e92d 4ff0       stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
(...)

That's with LTO. Without LTO, I get pretty much the same.

@Amanieu
Copy link
Member

Amanieu commented Aug 11, 2016

Yes, that's to be expected. Objdumping object files sometimes has issues with symbol resolution. Objdumping an executable should always show the correct symbols that are getting linked.

@@ -1,3 +1,35 @@
use core::intrinsics;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment explaining why an asm implementation was necessary (non-standard calling convention) would be nice here.

@japaric
Copy link
Member Author

japaric commented Aug 11, 2016

@Amanieu I think I addressed all your comments. Let me know if I missed anything.

if let Some(rem) = rem {
*rem = u64::from(n.high() % d.low());
}
return u64::from(n.high() / d.low());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think we could just go with panic!("Division by zero") here.

}

q = (q << 1) | carry;
q
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just return (q << 1) | carry directly

@Amanieu
Copy link
Member

Amanieu commented Aug 11, 2016

I think that covers everything, it should be good to go after those changes.

@japaric japaric merged commit 4ab4465 into master Aug 11, 2016
@japaric japaric deleted the udivmoddi4 branch August 11, 2016 07:08
@japaric
Copy link
Member Author

japaric commented Aug 11, 2016

Merging (test failures (ARM) are known to be flaky)

Thanks @Amanieu for the review!

@thejpster I hope you can give this crate another try! Let us know if you need other intrinsics!

@thejpster
Copy link

thejpster/stellaris-launchpad@3c901c2

Looks good to me. I reported a failure earlier, but I'd forgotten to extern create rustc_builtins!

@Amanieu
Copy link
Member

Amanieu commented Aug 11, 2016

I don't think you need extern crate, you should only need to add the dependency to Cargo.toml.

@thejpster
Copy link

thejpster commented Aug 11, 2016

Should no_compiler_rt be true or false?

@Amanieu
Copy link
Member

Amanieu commented Aug 11, 2016

true

@thejpster
Copy link

If I don't extern crate I get:

note: /home/jonathan/Documents/programming/rust/bare-metal-arm-rust/target/lm4f120/debug/deps/libprimer.rlib(primer.0.o): In function `primer::lm4f120h5qr::systick::run_time_us':
/home/jonathan/Documents/programming/rust/bare-metal-arm-rust/libs/primer/src/lm4f120h5qr/systick.rs:171: undefined reference to `__aeabi_uldivmod'
collect2: error: ld returned 1 exit status

@Amanieu
Copy link
Member

Amanieu commented Aug 11, 2016

Oh, I guess it is required then. My bad.

@japaric
Copy link
Member Author

japaric commented Aug 11, 2016

@thejpster

Looks good to me. I reported a failure earlier, but I'd forgotten to extern create rustc_builtins!

Nice! Experienced any significant change in binary size after moving to rustc_builtins?

@Amanieu
Copy link
Member

Amanieu commented Aug 11, 2016

I would expect it to be slightly slower since we aren't using the asm implementation of division from compiler-rt and instead use a rust implementation.

@thejpster
Copy link

thejpster commented Aug 11, 2016 via email

@japaric
Copy link
Member Author

japaric commented Aug 11, 2016

I did some measurements on this snippet:

#[no_mangle]
pub extern "C" fn start() -> ! {
    unsafe {
        let x = ptr::read_volatile(0x0 as *const u64);
        let y = ptr::read_volatile(0x0 as *const u64);
        let z = x / y;
        ptr::write_volatile(0x0 as *mut u64, z);

        loop {}
    }
}

compiler-rt.rs (intrinsics in C/asm)

   text    data     bss     dec     hex
    976       0       0     976     3d0

This crate (intrinsics in Rust)

   text    data     bss     dec     hex
   1046       0       0    1046     416

Build using Cargo's release profile with LTO enabled for ARM Cortex-M3.

And for reference, this program:

#[no_mangle]
pub extern "C" fn start() -> ! {
    loop {}
}

has this size:

   text    data     bss     dec     hex
    294       0       0     294     126

70 bytes bigger doesn't sound that bad. If as @Amanieu says there are other optimizations we could do here then perhaps that can bring down the size on par with compiler-rt's version maybe even make it smaller(!).

@japaric japaric mentioned this pull request Aug 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants